Study of Biological Sequence Clustering
Author
Saliya Ekanayake, School of Informatics and Computing, Indiana University [Internal Report]
Abstract
Determination of biologically related clusters of sequences is important in bioinformatics analyses. The similarity between sequences is generally assessed based on their alignments with one another. This similarity could be used with a clustering algorithm to determine groups of sequences, yet it is not straightforward to get reliable results. We present the factors affecting the quality of clusters and show how visualization aids in the refinement of results. We also present a way to verify clusters in the presence of consensus sequences and to represent clusters.
Similar resources
A computational method to analyze the similarity of biological sequences under uncertainty
In this paper, we propose a new method to analyze the difference and similarity of biological sequences, based on fuzzy set theory. Considering the sequence order and some chemical and structural properties, we present a computational method to cluster the biological sequences. Through some examples, we show that the new method is relatively easy and we are able to compare the sequences of arbi...
Repeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...
Molecular Typing of Mycobacterium Tuberculosis Isolated from Iranian Patients Using Highly Abundant Polymorphic GC-Rich-Repetitive Sequence
Background: Tuberculosis (TB), with more than 10 million new cases per year and as one of the top 10 causes of death worldwide, is still one of the most important global health problems. Also, multidrug-resistant tuberculosis (MDR) is a serious danger to public health. Understanding of the epidemiological pattern of Mycobacterium tuberculosis (MTB), estimates of recent transmission and recurrence ...
Bi-criteria Scheduling in a Hybrid Flow Shop Environment with Non-identical Machines
This study considers scheduling in a hybrid flow shop environment with unrelated parallel machines, minimizing the mean tardiness and mean completion time of jobs. This problem has not been studied in the literature so far. The flexible flow shop environment is applicable in various industries such as wire and spring manufacturing, electronic industries and production lines. After modeling the p...
Finding Exact and Solo LTR-Retrotransposons in Biological Sequences Using SVM
Finding repetitive subsequences in genomes is a challenging problem in bioinformatics research. Many approaches have been proposed to solve the problem, which can be divided into library-based and de novo methods. The library-based methods use predetermined repetitive genome subsequences, whereas library-less methods attempt to discover repetitive subsequences by analytical approach...
Clustering of Short Read Sequences for de novo Transcriptome Assembly
Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...
Publication date: 2013